setwd("D:/Outline/Webinar-1")

Anatomy of a Story:

Visualisation:

Types:

  1. Based on number of variables / characters (univariate / multivariate)

  2. Static or interactive

In this brief presentation we shall get a glimpse of each of them.

Story from Data:

In the following sections, we will start with a very simple dataset and add complexity step by step.

First Story

In the following section, we shall delve into a very small exercise.

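library(dplyr)    # %>%, group_by(), summarize()
library(ggplot2)  # static charts
library(plotly)   # interactive charts

# stringsAsFactors = TRUE so that sex and race are read as factors (R >= 4.0 defaults to FALSE)
income <- read.csv("income.csv", stringsAsFactors = TRUE)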
str(income)
## 'data.frame':    1192 obs. of  6 variables:
##  $ earn  : num  50000 60000 30000 50000 51000 9000 29000 32000 2000 27000 ...
##  $ height: num  74.4 65.5 63.6 63.1 63.4 ...
##  $ sex   : Factor w/ 2 levels "female","male": 2 1 1 1 1 1 1 2 2 2 ...
##  $ ed    : int  16 16 16 16 17 15 12 17 15 12 ...
##  $ age   : int  45 58 29 91 39 26 49 46 21 26 ...
##  $ race  : Factor w/ 4 levels "black","hispanic",..: 4 4 4 3 4 4 4 4 2 4 ...
summary(income)
##       earn            height          sex            ed            age       
##  Min.   :   200   Min.   :57.50   female:687   Min.   : 3.0   Min.   :18.00  
##  1st Qu.: 10000   1st Qu.:64.01   male  :505   1st Qu.:12.0   1st Qu.:29.00  
##  Median : 20000   Median :66.45                Median :13.0   Median :38.00  
##  Mean   : 23155   Mean   :66.92                Mean   :13.5   Mean   :41.38  
##  3rd Qu.: 30000   3rd Qu.:69.85                3rd Qu.:16.0   3rd Qu.:51.00  
##  Max.   :200000   Max.   :77.05                Max.   :18.0   Max.   :91.00  
##        race    
##  black   :112  
##  hispanic: 66  
##  other   : 25  
##  white   :989  
##                
## 

So we have 6 variables (characters), which are self-explanatory, except that ed refers to years of “education”.

We are considering “income” as the phenomenon under study. Therefore, the “height” variable is not considered.
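
A minimal sketch of that choice in base R: drop `height` before the analysis. The small data frame below is invented for illustration and only mimics the column names of `income`; `income_toy`, `study_vars` and `income_study` are hypothetical names, not part of the original script.

```r
# Toy stand-in for the income data (values are invented for illustration)
income_toy <- data.frame(
  earn   = c(50000, 60000, 30000, 9000),
  height = c(74.4, 65.5, 63.6, 63.1),
  sex    = factor(c("male", "female", "female", "male")),
  ed     = c(16L, 16L, 12L, 15L),
  age    = c(45L, 58L, 29L, 26L),
  race   = factor(c("white", "white", "black", "hispanic"))
)

# Keep every column except height, which is not part of this story
study_vars   <- setdiff(names(income_toy), "height")
income_study <- income_toy[, study_vars]

names(income_study)
```

With the real data, the same two lines can be applied to `income` instead of `income_toy`.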

ggplot(data = income, aes(x = race)) + geom_bar(aes(fill = race))

ggplot(data = income, aes(x = sex)) + geom_bar(aes(fill = sex))

ggplot(data = income, aes(x = sex)) + geom_bar(aes(fill = sex)) + facet_wrap(~race)

All the figures describe the composition of the data in terms of race and gender. The observations are as follows:

  1. Overwhelming presence of “white” people.

  2. There are more females than males.

  3. Within each race, the split between females and males looks broadly similar in the faceted chart.
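
The composition read off the bar charts can also be checked numerically. The sketch below uses base R's `table()` and `prop.table()`; `composition` is a hypothetical helper, and the toy data frame is invented only so that the block runs on its own (with the real data, pass `income` instead).

```r
# Hypothetical helper: counts and percentage shares for one categorical column
composition <- function(df, var) {
  counts <- table(df[[var]])
  data.frame(
    level = names(counts),
    n     = as.integer(counts),
    share = round(100 * as.numeric(prop.table(counts)), 1)
  )
}

# Invented toy data with the same column names as income
toy <- data.frame(
  sex  = c("female", "female", "male", "female", "male", "female"),
  race = c("white", "white", "white", "black", "white", "hispanic")
)

composition(toy, "race")
composition(toy, "sex")

# Cross-tabulation of sex within each race, the numeric analogue of the faceted chart
table(toy$race, toy$sex)
```

Reading counts and shares side by side makes it easier to separate two different questions: how large each race group is, and how the sexes are balanced within it.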

Please note that all the diagrams above are static, not interactive. In the next section we will look at the demographic profile and also try out “interactive” charts.

# First example of interactive plot

income %>% plot_ly(x = ~race) %>%  add_histogram(color=~sex) %>% group_by(race, sex) %>%
  summarise(n = n())
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
# Examining demographic profiles

table1 <- income %>% group_by(race,sex) %>% summarize(avg.income = mean(earn), Edu_Years = mean(ed),Nos = n())

colnames(table1) <- c("Race","Gender", "Avg.Income", "Avg.Education","Nos")

# Examining the educational profile by race and gender

plot_ly(income,  y = ~ed, color = ~race, type = "box")
plot_ly(table1, x = ~Race, y = ~Avg.Education, color = ~Race, type = "bar")
plot_ly(table1, x = ~Race, y = ~Avg.Education, color = ~Gender, type = "bar")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

In the next section, we will explore the income profile.

plot_ly(data = income, x = ~race, y = ~earn, color  = ~race, type = "box")
plot_ly(data = income, x = ~sex, y = ~earn, color  = ~sex, type = "box")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
plot_ly(data = table1, x = ~Race, y = ~Avg.Income, color  = ~Race, type = "bar")
plot_ly(data = table1, x = ~Gender, y = ~Avg.Income, color  = ~Gender, type = "bar")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
plot_ly(data = table1, x = ~Race, y = ~Avg.Income, color = ~Gender, size = ~Avg.Income, type = "bar")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

In terms of income disparity, the following can be noted:

  1. Within “whites” there is a section of people whose income is significantly greater than that of the others.

  2. White people have the highest average income and Hispanic people the lowest.

  3. A large number of males earn more than females.

  4. The average income of females is greater than that of males ONLY for Hispanic people. For the other races, the average income of males is greater.
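
These points can be cross-checked with numbers as well as charts. The sketch below is a base R approximation, not the original workflow: `aggregate()` gives the group means that `table1` holds as `Avg.Income`, and `boxplot.stats()` lists the values a box plot would draw as outliers. The toy data are invented so the block runs on its own; `avg_income` and `white_earn` are hypothetical names.

```r
# Invented toy data with the same column names as income
toy <- data.frame(
  earn = c(20000, 25000, 30000, 22000, 200000, 15000, 18000, 12000),
  sex  = c("female", "male", "male", "female", "male", "female", "male", "female"),
  race = c("white", "white", "white", "white", "white", "hispanic", "hispanic", "black")
)

# Mean income for each race/sex combination
avg_income <- aggregate(earn ~ race + sex, data = toy, FUN = mean)
avg_income

# Incomes that a box plot would flag as outliers within one group
white_earn <- toy$earn[toy$race == "white"]
boxplot.stats(white_earn)$out
```

Comparing group means with group outliers helps to judge whether a high average reflects the whole group or only a few very large incomes.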

In the following section we shall take a unified view, covering both education and income.
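
One hedged way to put a number on this unified view is to fit, separately for each group, the kind of straight line that `geom_smooth(method = "lm")` draws. The sketch uses base R's `lm()`; `slope_by_group` is a hypothetical helper, and the toy data are invented so that the block runs on its own.

```r
# Hypothetical helper: slope of earn on ed within each level of a grouping column
slope_by_group <- function(df, group) {
  sapply(split(df, df[[group]]), function(d) {
    unname(coef(lm(earn ~ ed, data = d))["ed"])
  })
}

# Invented toy data with the same column names as income
toy <- data.frame(
  ed   = c(10, 12, 14, 16, 10, 12, 14, 16),
  earn = c(10000, 14000, 18000, 22000, 12000, 18000, 24000, 30000),
  sex  = rep(c("female", "male"), each = 4)
)

# Extra income associated with one more year of education, by sex
slope_by_group(toy, "sex")
```

With the real data, `slope_by_group(income, "sex")` or `slope_by_group(income, "race")` would give one slope per panel of a faceted chart.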

ggplot(data = income, aes(x = ed, y = earn)) + geom_point(aes(color = sex)) + facet_wrap(~race) + geom_smooth(method = "lm")

ggplot(data = income, aes(x = ed, y = earn)) + geom_point(aes(color = sex, size = ed)) + facet_wrap(~sex) + geom_smooth(method = "lm")

coplot(earn ~ ed | race*sex, data = income, panel = panel.smooth)

plot_ly(data = income, x = ~ed, y = ~earn, type = "scatter", mode = "markers", color = ~sex, frame = ~cut(ed, 10), size = 5)
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

Points to be noted:

Now we will start with story No-2: